Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[b/331681978] Add Short URL matching to KaggleFileResolver #1375

Merged
merged 9 commits into from
Apr 2, 2024
Merged

Conversation

lucyhe
Copy link
Contributor

@lucyhe lucyhe commented Mar 27, 2024

This change is required to unblock https://github.com/Kaggle/kaggleazure/pull/28591, which updates the Kernel MT to only use short URLs.

Testing
Added unit test and ran ./test

@lucyhe lucyhe marked this pull request as ready for review March 29, 2024 19:54
@lucyhe lucyhe marked this pull request as draft March 29, 2024 21:18
@lucyhe lucyhe marked this pull request as ready for review March 29, 2024 23:01
self.assertEqual([1, 1], layer(test_inputs).shape)
# Delete the files that were created in KaggleJwtHandler's do_POST method
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was needed or else os.makedirs and os.symlink would fail when the new test tried to recreate the same files. (Alternative solution was to use a different model URL for the new test, but I thought it was cleaner to do the proper cleanup after each test.)

@@ -4,18 +4,19 @@

from tensorflow_hub import resolver

url_pattern = re.compile(r"https?://([a-z]+\.)?kaggle.com/models/(?P<owner>[^\\/]+)/(?P<model>[^\\/]+)/frameworks/(?P<framework>[^\\/]+)/variations/(?P<variation>[^\\/]+)/versions/(?P<version>[0-9]+)$")
short_url_pattern = re.compile(r"https?://([a-z]+\.)?kaggle.com/models/(?P<owner>[^\\/]+)/(?P<model>[^\\/]+)/(?P<framework>[^\\/]+)/(?P<variation>[^\\/]+)/(?P<version>[0-9]+)$")
long_url_pattern = re.compile(r"https?://([a-z]+\.)?kaggle.com/models/(?P<owner>[^\\/]+)/(?P<model>[^\\/]+)/frameworks/(?P<framework>[^\\/]+)/variations/(?P<variation>[^\\/]+)/versions/(?P<version>[0-9]+)$")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was considering if we should load these via an ENV var that we can mutate, but I suppose it is really rare that we mutate this, and it might be fine if only new images can support changes like this going forward.

@@ -4,18 +4,19 @@

from tensorflow_hub import resolver

url_pattern = re.compile(r"https?://([a-z]+\.)?kaggle.com/models/(?P<owner>[^\\/]+)/(?P<model>[^\\/]+)/frameworks/(?P<framework>[^\\/]+)/variations/(?P<variation>[^\\/]+)/versions/(?P<version>[0-9]+)$")
short_url_pattern = re.compile(r"https?://([a-z]+\.)?kaggle.com/models/(?P<owner>[^\\/]+)/(?P<model>[^\\/]+)/(?P<framework>[^\\/]+)/(?P<variation>[^\\/]+)/(?P<version>[0-9]+)$")
long_url_pattern = re.compile(r"https?://([a-z]+\.)?kaggle.com/models/(?P<owner>[^\\/]+)/(?P<model>[^\\/]+)/frameworks/(?P<framework>[^\\/]+)/variations/(?P<variation>[^\\/]+)/versions/(?P<version>[0-9]+)$")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a unified version of the two regex's if we wanted to still only need one:

https?://([a-z]+\.)?kaggle.com/models/(?P<owner>[^\\/]+)/(?P<model>[^\\/]+)(/frameworks)?/(?P<framework>[^\\/]+)(/variations)?/(?P<variation>[^\\/]+)(/versions)?/(?P<version>[0-9]+)$

https://regex101.com/r/JbWJNI/1

All I did was making the "extra" parts of the path optional, though I suppose this does make each part independently optional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind if we keep short_url_pattern and long_url_pattern? I lean towards that because it feels more readable and avoids the "independently optional parts" issue you mention

@lucyhe lucyhe merged commit 5afea0a into main Apr 2, 2024
4 checks passed
@lucyhe lucyhe deleted the lh/short branch April 2, 2024 21:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants